Setup

library(tidyverse)
library(tidymodels)

theme_set(theme_minimal())
spotify_songs <- read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv') %>% 
  mutate(date = lubridate::ymd(track_album_release_date)) %>% 
  mutate(year = lubridate::year(date))

spotify_songs

EDA (exploratory data analysis)

We start off by counting things. How many genres are there? How much data do we have on each of those?

spotify_songs %>% 
  count(playlist_genre)

How many subgenres?

spotify_songs %>% 
  count(playlist_genre, playlist_subgenre)
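
Since count() returns one row per combination, the number of subgenres is simply the number of rows in its result. A minimal sketch on a made-up tibble (toy_songs and its values are hypothetical stand-ins for spotify_songs):

```r
library(dplyr)

# Hypothetical miniature version of spotify_songs, for illustration only
toy_songs <- tibble(
  playlist_genre    = c("rock", "rock", "rock", "pop", "pop"),
  playlist_subgenre = c("classic rock", "classic rock", "hard rock",
                        "dance pop", "electropop")
)

# One row per genre/subgenre combination, most frequent first
subgenre_counts <- toy_songs %>%
  count(playlist_genre, playlist_subgenre, sort = TRUE)

# The number of rows is the number of distinct subgenres
n_subgenres <- nrow(subgenre_counts)
n_subgenres
```

On the real data, the same two steps would apply to spotify_songs unchanged.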

Next we jump into exploring the distributions of different features.

spotify_songs %>% 
  ggplot(aes(valence, fill = playlist_genre)) +
  geom_density(alpha = 0.3) +
  scale_y_continuous(expand = expansion(mult = c(0, 0.1)))

By breaking with the tidy data principle in a clever way, namely by pivoting every numeric column into a single pair of name and value columns, we can visualize the distributions of all our numerical features at once.

spotify_songs %>% 
  pivot_longer(where(is.numeric)) %>% 
  ggplot(aes(value, fill = playlist_genre)) +
  geom_density(alpha = 0.2) +
  facet_wrap(~ name, scales = "free") +
  guides(fill = guide_legend(title = "Genre")) +
  theme(legend.position = c(1, 0),
        legend.justification = c(1, 0),
        legend.direction = "horizontal")
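
To see what the pivot_longer step does before it reaches ggplot, here is a small sketch on a made-up three-row tibble (toy_songs and its values are hypothetical, not taken from the Spotify data). Every numeric column is stacked into a name column and a value column, which is what lets facet_wrap(~ name) draw one panel per feature.

```r
library(dplyr)
library(tidyr)

# Hypothetical three-row stand-in for spotify_songs
toy_songs <- tibble(
  playlist_genre = c("rock", "edm", "pop"),
  energy         = c(0.8, 0.9, 0.6),
  tempo          = c(120, 128, 100)
)

# Stack all numeric columns; the defaults name the new columns "name" and "value"
toy_long <- toy_songs %>%
  pivot_longer(where(is.numeric))

toy_long
```

The result has two rows per song, one for each numeric feature, while the non-numeric playlist_genre column is repeated alongside.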

Rock music has to step up its dancing game, I suppose. And there is a favorite tempo for EDM.

It is often interesting to explore changes over time.

spotify_songs %>% 
  ggplot(aes(year, loudness)) +
  geom_point() +
  geom_smooth()

spotify_songs %>% 
  ggplot(aes(year, track_popularity)) +
  geom_point() +
  geom_smooth()

All of this is an iterative process, so if we find a plot interesting, we can spend some more time making it pretty by adding labels, titles, colors and theming. This process can take as long as you want, and sometimes you have to experiment a bit, so I am not doing much of it here. It is also highly context-dependent.

spotify_songs %>% 
  ggplot(aes(playlist_genre, track_popularity, fill = playlist_genre)) +
  geom_violin() +
  stat_summary(color = "white", show.legend = FALSE) +
  fishualize::scale_fill_fish_d() +
  guides(fill = "none")
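
As one possible next step in polishing such a plot, labs() adds a title and readable axis labels. This is only a sketch on made-up data (toy_songs and the label texts are hypothetical); the right wording depends on the audience.

```r
library(ggplot2)

# Made-up stand-in for spotify_songs, for illustration only
set.seed(42)
toy_songs <- data.frame(
  playlist_genre   = rep(c("pop", "rock"), each = 50),
  track_popularity = c(runif(50, 30, 90), runif(50, 10, 80))
)

p <- ggplot(toy_songs, aes(playlist_genre, track_popularity, fill = playlist_genre)) +
  geom_violin() +
  # Human-readable title and axis labels instead of raw column names
  labs(
    title = "Track popularity by playlist genre",
    x = "Playlist genre",
    y = "Track popularity"
  ) +
  # The x axis already names the genres, so the fill legend is redundant
  guides(fill = "none")

p
```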

spotify_songs %>% 
  ggplot(aes(energy, tempo)) +
  geom_hex() +
  theme_minimal() +
  scale_fill_viridis_c()

There was a question about coloring something based on two features instead of one. This is not available out of the box, but we can hack something together. The function hsv generates colors from hue, saturation and value, so we can add a new column to our data that contains colors we construct ourselves from some other columns. We can then tell ggplot to use the actual values of the color column as colors (instead of creating its own color scale) by adding scale_color_identity.
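
A minimal sketch of this idea follows. It uses made-up data, and the choice of valence for the hue and energy for the saturation is an assumption for illustration; any two columns scaled to the range 0 to 1 would work the same way.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for spotify_songs; valence and energy lie between 0 and 1,
# as they do in the Spotify audio features
set.seed(1)
toy_songs <- tibble(
  valence = runif(200),
  energy  = runif(200),
  tempo   = runif(200, 60, 200)
)

# hsv() expects hue, saturation and value in [0, 1] and returns hex color strings.
# Hue is circular (0 and 1 are both red), so we only use part of the hue range.
toy_colored <- toy_songs %>%
  mutate(color = hsv(h = 0.8 * valence, s = energy, v = 1))

# scale_color_identity() makes ggplot use the hex strings as they are
p <- ggplot(toy_colored, aes(energy, tempo, color = color)) +
  geom_point() +
  scale_color_identity()

p
```

Note that an identity scale draws no legend by default, so the meaning of hue and saturation has to be explained in the text or a caption.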
